Semi-Supervised Learning with Explicit Misclassification Modeling
نویسندگان
چکیده
This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. This algorithm optimizes the classification maximum likelihood of a set of labeledunlabeled data, using a variant form of the Classification Expectation Maximization (CEM) algorithm. Its originality is that it makes use of both unlabeled data and of a probabilistic misclassification model for these data. The parameters of the labelerror model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four data-sets and show the advantages of this method over a previously developed semi-supervised algorithm which does not consider imperfections in the labeling process.
منابع مشابه
Generalized mixture models, semi-supervised learning, and unknown class inference
In this paper, we discuss generalized mixture models and related semi-supervised learning methods, and show how they can be used to provide explicit methods for unknown class inference. After a brief description of standard mixture modeling and current model-based semi-supervised learning methods, we provide the generalization and discuss its computational implementation using three-stage expec...
متن کاملCost-Sensitive Self-Training
In some real-world applications, it is time-consuming or expensive to collect much labeled data, while unlabeled data is easier to obtain. Many semi-supervised learning methods have been proposed to deal with this problem by utilizing the unlabeled data. On the other hand, on some datasets, misclassifying different classes causes different costs, which challenges the common assumption in classi...
متن کاملSemi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
متن کاملMinimax Binary Classifier Aggregation with General Losses
We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification e...
متن کاملOptimal Binary Classifier Aggregation for General Losses
We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification e...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003